WO 03/096184

PCT/DE03/01540 DT01 Rec'd PCT/PTO | 3 OCT 2004

## Method and arrangement for the power-efficient control of processors

The invention relates to a method for functionally controlling the program and/or data flow in digital signal processors and processors having respective closed modules which are separate from one another, are intended for program and data flow control and operate in parallel arithmetic units.

Processors whose architecture has a slice structure are gaining increasing importance in digital signal processors (DSP). In this case, data paths are combined to form slices, a signal processing operation in a first slice being carried out independently of the signal processing that is taking place in a parallel manner in a second slice.

If operations are carried out in the parallel arithmetic units of these digital signal processors using the SIMD instruction type, the problem arises in the prior art that the algorithms used in this case are often not suited to the parallel signal processing in all of the slices.

In the case of the signal processing in the individual slices, for example, the results obtained can therefore usually be provided only at different points in time or after a different number of processor clock cycles in the respective slice as a result of the respective different algorithms used there.

Processing instructions in step with the other SIMD slices either cannot be implemented at all or can be implemented only with a high outlay.

This necessarily high outlay occurs, on the one hand, in terms of software, as additional programs which are to be executed and organize the different waiting times for the slices in order to provide the results in a parallel manner.

This high outlay arises, on the other hand, in the hardware, as heavy processor and memory utilization that reduces the processor performance. This reduction may be averted, for example, by expanding the memory but this signifies an increase in the outlay on hardware.

It proves to be disadvantageous in the prior art that, in order to necessarily adapt the algorithms to the SIMD instruction type during the signal processing, primarily in the slices with their associated data paths, these slices and the additional associated VLIW architecture of the processor have to be supplied, to a considerable extent, with no-operation instructions (NOP).

This not only renders the power-increasing effects of using the SIMD instruction type ineffective but also requires an additional outlay on hardware and software in order to adapt the algorithms.
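The cost of this padding can be illustrated with a minimal sketch. It assumes, purely for illustration, that each slice needs a fixed number of clock cycles for its algorithm and that every shorter slice is filled with NOP slots until the longest one finishes; the function name `nop_padding` and the cycle counts are hypothetical and do not come from the specification.

```python
def nop_padding(cycles_per_slice):
    """Return the NOP slots each slice needs so that all slices finish together.

    cycles_per_slice: assumed per-slice cycle counts for one SIMD step.
    """
    longest = max(cycles_per_slice)
    return [longest - cycles for cycles in cycles_per_slice]


# Two slices whose algorithms need 3 and 7 cycles (illustrative numbers only).
padding = nop_padding([3, 7])

# Share of all issue slots that carry no useful work in this example.
wasted_fraction = sum(padding) / (max([3, 7]) * 2)
```

In this example the shorter slice has to be supplied with four NOP slots, so 4 of the 14 issue slots do no useful work.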

The formulated object according to the invention is thus to individually adapt the signal processing (when the SIMD instruction type is used) in the individual data paths in a power-efficient manner and, in particular, to minimize the occurrence of NOP instructions with which the VLIW architecture of the processor must be supplied.

The formulated object is achieved according to the invention by means of the fact that the parallel signal processing - as a result of the SIMD instructions which are converted by the PCU - of the processor is individually controlled, in a respective data path (DP) of a first and a second slice, by means of a "single slice halt" state that is output by an SSM register bank for each slice.

In this case, the controlling effect of the "single slice halt" state that has been output is achieved by the bits (which are assigned to the first and second slices) of the SSM register bank switching the register clock supply via the respectively associated first and second gated clock cells.

As a result, the associated input register and/or accumulator and/or pipeline control register is/are stopped in the meantime depending on the state of the signal processing occurring in the slice of the data path.

This functioning is enabled only by the "single slice halt" state that has been output being discontinued when a further SIMD instruction is converted.

The register file unit (RFU) and the memory access register of the processor remain in operation irrespective of the "single slice halt" state that has been output. The PCU can in this case write to the SSM register bank of the PCU at any time.

This solution is aimed at beginning with the individual calculations in a parallel manner in the slices of the data paths of the processor, in accordance with the SIMD instruction type.

However, as a result of the different calculation processes, the intermediate and/or final results in the slices are provided at different points in time in the pipeline control registers, accumulators and result registers of the associated data paths.

After the intermediate and/or final result values have been provided, a further signal processing operation that is no longer laden with results is thus prevented in the data paths which are associated with the individual slices.

The signal processing is continued in a parallel manner in all of the data paths of the slices if a start is made on processing a further SIMD instruction.
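The behaviour described above may be pictured with the following behavioural sketch. It is not a description of the actual circuit: the class names, the per-slice cycle counts and the rule that a slice's SSM bit is set as soon as its result is present are assumptions made for illustration. The sketch only shows that a set SSM bit blocks the register clock of its slice, that the RFU keeps being clocked, and that a further SIMD instruction clears all bits.

```python
class Slice:
    """One data path with a clock-gated accumulator (hypothetical model)."""

    def __init__(self, work_cycles):
        self.work_cycles = work_cycles  # cycles this slice's algorithm needs
        self.done_cycles = 0
        self.accumulator_clocks = 0     # clock edges that reached the register

    def result_ready(self):
        return self.done_cycles >= self.work_cycles


class Processor:
    def __init__(self, work_cycles_per_slice):
        self.slices = [Slice(w) for w in work_cycles_per_slice]
        # One SSM bit per slice; True stands for "single slice halt".
        self.ssm_bits = [False] * len(self.slices)
        self.rfu_clocks = 0  # the RFU is never gated

    def issue_simd(self, work_cycles_per_slice):
        """A further SIMD instruction ends the halt state in every slice."""
        for s, w in zip(self.slices, work_cycles_per_slice):
            s.work_cycles, s.done_cycles = w, 0
        self.ssm_bits = [False] * len(self.slices)

    def clock(self):
        self.rfu_clocks += 1
        for i, s in enumerate(self.slices):
            if self.ssm_bits[i]:
                continue  # gated clock cell blocks the register clock
            s.done_cycles += 1
            s.accumulator_clocks += 1
            if s.result_ready():
                self.ssm_bits[i] = True  # assumed: bit set once the result is present


# First slice needs 3 cycles, second slice 7 (illustrative numbers only).
processor = Processor([3, 7])
for _ in range(7):
    processor.clock()
```

After seven clock cycles the accumulator of the first slice has been clocked only three times, the accumulator of the second slice seven times, and the RFU seven times. Calling `issue_simd` then resets both bits so that all data paths start again together.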

A supplementary embodiment of the solution (according to the invention) of the formulated object consists in controlling the clock supply for the VLIW unit, by means of a software-dictated output of the state from the program flow of the processor, in such a manner that, as a result, partial instruction words which are currently present in the VLIW unit are subsequently provided in the latter for multiple use at the functional units.

This solution according to the invention advantageously becomes effective if necessary adaptation of algorithms to the SIMD instruction type during the signal processing makes it necessary for the data paths and the associated VLIW architecture of the processor to be supplied with no-operation instructions (NOP) or similar instructions with a high repetition rate. In this case, avoiding the generation of identical VLIWs reduces the amount of memory space used and keeps the computing load of the processor low, with the result that the computing power is efficiently available for the important calculations.
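The memory effect can be estimated with a small sketch, assuming that a run of identical consecutive VLIWs can be replaced by a single stored word that is held for the length of the run. The instruction names are placeholders, and the sketch ignores whatever program code is needed to output the hold state itself.

```python
from itertools import groupby


def held_words(vliw_stream):
    """Collapse runs of identical consecutive VLIWs into (word, repeat) pairs."""
    return [(word, len(list(run))) for word, run in groupby(vliw_stream)]


# Placeholder instruction stream with a run of four identical NOP words.
stream = ["MAC", "NOP", "NOP", "NOP", "NOP", "ADD"]
compact = held_words(stream)

words_without_reuse = len(stream)
words_with_reuse = len(compact)
```

For this stream, six stored instruction words shrink to three, because the four NOP words are generated once and then reused.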

One advantageous variant of the supplementary embodiment of the solution according to the invention consists in interrupting the generation of further VLIWs in the VLIW unit by the PCU being informed of a VLIW WAIT command via an advance signal line and this command being applied to the PCU in the next clock cycle, the PCU then switching the clock supply for the VLIW unit by means of a "VLIW WAIT" signal line and a third gated clock cell.

This solution is aimed at being able to realize debugging routines in software tests by it being possible to set and start software breakpoints in the program code.
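The one-cycle offset between the advance signal and the gating of the VLIW unit can be sketched as follows. The function `run`, the instruction names and the rule that the wait lasts exactly as long as the advance signal is raised are assumptions for illustration; the specification does not state how the wait is released.

```python
def run(program, wait_requests, cycles):
    """Return the VLIW presented to the functional units in each cycle.

    program: instruction words fetched in order (placeholder names).
    wait_requests: cycle numbers in which the advance signal is raised.
    """
    pc = 0
    current = None
    vliw_wait = False  # state of the "VLIW WAIT" line in the current cycle
    issued = []
    for cycle in range(cycles):
        if not vliw_wait:
            current = program[pc]  # VLIW unit is clocked and takes the next word
            pc += 1
        # Otherwise the gated clock cell holds the word already in the VLIW unit.
        issued.append(current)
        # A command announced now is applied to the PCU in the next cycle.
        vliw_wait = cycle in wait_requests
    return issued


trace = run(["A", "B", "C", "D"], wait_requests={1, 2}, cycles=6)
```

With the advance signal raised in cycles 1 and 2, word `B` stays in the VLIW unit during cycles 2 and 3, giving the sequence A, B, B, B, C, D.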

The invention will be explained in more detail below with reference to an exemplary embodiment for outputting a single slice halt state. The figure of the drawing contains a block diagram of the processor, in which the parts with the associated functional units which relate to the solution according to the invention are given.

In the event of the "single slice halt" state being output, it is a prerequisite that an SIMD instruction is output by the VLIW unit 2 via the SIMD control bus 12. This individual SIMD instruction triggers multiple data processing in the respective data path 14 of the first and second slices 18; 19.

The results are provided at different points in time in the associated accumulator 8. In this case, a respective bit (which is assigned to the first and second slices 18; 19) of the SSM register bank 13 is set.

The signal allocation of this bit is supplied, via the first and/or second gated clock cell 3; 4, to the data path 14 (that is respectively associated with the first and second slices 18; 19) and individually controls the signal processing in the first and second slices 18; 19 in that the clock supply at the associated input register and thus also the signal processing are prevented when a result is present in this slice.

When a further SIMD instruction is output on the SIMD control bus 12, for example after the last result worked out in one of the slices has been provided, the respective bit of the SSM register bank 13 is reset and all of the data paths begin the next signal processing operation by reading in the data provided by the RFU 11 at their input registers.

The signal processing in the individual slices of the data paths 14 is thus advantageously adapted to the requirements of parallel processing of the SIMD instructions.

### List of reference symbols

| No. | Designation                                     |
|-----|-------------------------------------------------|
| 1   | Processor                                       |
| 2  | VLIW (Very Long Instruction Word) unit          |
| 3  | First gated clock cell                          |
| 4  | Second gated clock cell                         |
| 5  | AGU (Address Generating Unit)                   |
| 6  | PCU (Process Controlling Unit)                  |
| 7  | Clock supply line                               |
| 8  | Accumulator                                     |
| 9  | Further processing unit (with gated clock cell) |
| 10 | Register of the further processing unit         |
| 11 | RFU (Register File Unit)                        |
| 12 | SIMD control bus                                |
| 13 | SSM (Single Slice Mode) register bank           |
| 14 | Data path                                       |
| 15 | SIMD data path control line                     |
| 16 | Advance signal line                             |
| 17 | VLIW WAIT signal line                           |
| 18 | First slice                                     |
| 19 | Second slice                                    |
| 20 | Third gated clock cell                          |

#### Patent Claims

1. A method for functionally controlling the program and/or data flow in digital signal processors and processors having respective closed modules which are separate from one another, are intended for program and data flow control and operate in parallel arithmetic units, wherein, as a result of the SIMD instructions which are converted by the PCU (6), the parallel signal processing of the processor (1) is individually controlled, in a data path DP (14) that is respectively associated with the first and second slices (18); (19), by means of a "single slice halt" state that is output by an SSM register bank (13), the controlling effect of the "single slice halt" state that has been output being achieved by the bits (which are assigned to each slice) of the SSM register bank (13) switching the register clock supply via the respective first and second gated clock cells (3); (4) and, as a result, functioning of the assigned input register and/or accumulator and/or pipeline control register being stopped in the meantime depending on the state of the signal processing occurring in the DP (14) associated with the respective slice and functioning being enabled again only by the "single slice halt" state that has been output being discontinued as a result of a further SIMD instruction being converted, and wherein the register file unit (RFU) (11) and the memory access register of the processor (1) remain in operation irrespective of the "single slice halt" state that has been output and the PCU (6) can in this case write to the SSM register bank (13) of the PCU at any time.

2. A method for functionally controlling the program and/or data flow in digital signal processors and processors having respective closed modules which are separate from one another, are intended for program and data flow control and operate in parallel arithmetic units, wherein the clock supply for the VLIW unit (2) is controlled, by means of a software-dictated output of the state from the program flow of the processor (1), in such a manner that, as a result, instruction words which are currently present in the VLIW unit (2) are subsequently provided in the latter for multiple use at the functional units.

3. The method as claimed in claim 2, wherein the generation of further VLIWs in the VLIW unit (2) is interrupted by the PCU (6) being informed of a VLIW WAIT command via an advance signal line (16) and this command being applied to the PCU (6) in the next clock cycle, the PCU (6) then switching the clock supply for the VLIW unit (2) by means of a "VLIW WAIT" signal line (17) and a third gated clock cell (20).

